Abstract



Multi-armed bandits are one of the fundamental problems in se- 
quential decision theory, and are currently relevant to artificial in- 
telligence and online services. In the cases of continuum-armed and 
tree-armed bandits, we describe an algorithm obtaining near-optimal 
rates of regret, without knowledge of the reward distributions. In tree- 
armed bandits, our algorithm can work with infinite trees, and adap- 
tively combine multiple trees so as to minimise the regret. Applying 
this algorithm to continuum-armed bandits, we obtain square-root re- 
gret, without prior information, whenever the mean function satisfies 
a condition we call zooming continuity, which holds in some generality. 

1 Introduction 

Multi-armed bandits are one of the fundamental problems in sequential de- 
cision theory. Originally proposed as a model of adaptive clinical trials, 
they have more recently been applied to a number of problems in online ser- 
vices, including website optimisation, auction design, and network routing 
(Kleinberg, 2005; Bubeck and Cesa-Bianchi, 2012). 

In a multi-armed bandit problem, at each time $t \in \mathbb{N}$, we choose
an arm $x_t \in X$, and observe a reward $Y_t \in [0, 1]$, independent of past
events, having distribution $P(x_t)$, with mean $\mu(x_t)$. Our goal is to
choose the arms $x_t$, using past observations $Y_1, \dots, Y_{t-1}$, so as to
minimise the regret,

$$R_T := \sum_{t=1}^{T} \bigl(\mu^* - \mu(x_t)\bigr),$$

where $\mu^* := \max_{x \in X} \mu(x)$.
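As a small illustration of the regret just defined, the following Python sketch computes $R_T$ for a finite set of arms. The arm names, their means, and the played sequence are hypothetical values chosen only for the example.

```python
def regret(mu, arms, played):
    """Cumulative regret R_T = sum_t (mu* - mu(x_t)) over the played arms."""
    mu_star = max(mu(x) for x in arms)  # best achievable mean reward
    return sum(mu_star - mu(x) for x in played)


# Hypothetical example: three arms with known means.
means = {"a": 0.2, "b": 0.5, "c": 0.9}
total = regret(means.get, list(means), ["a", "c", "b", "c"])
```

Playing the best arm contributes nothing to the sum, so only the rounds spent on arms "a" and "b" add to the regret here.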

When the number of arms is small, this problem is now well-understood. 
However, when the number of arms is large or infinite, the problem is more 
interesting. In particular, the problem of continuum-armed bandits (CAB), 
where the mean reward $\mu$ is a continuous function on $[0, 1]^p$, has been
the subject of much recent attention in the literature.

Early work devised algorithms for when the function $\mu$ is Lipschitz with
respect to a known metric (Agrawal, 1995; Kleinberg, 2005). For example,
when $\mu$ is $\alpha$-Lipschitz on $[0, 1]$, Kleinberg obtains
$O^*(T^{1-\alpha/(2\alpha+1)})$ regret, and further shows this rate is optimal
over such functions.

Auer et al. (2007) provide improved bounds on the regret by also considering
the shape of $\mu$ near its maxima. If the region where $\mu$ is within
$\varepsilon$ of its maximum has size shrinking like
$\varepsilon^{1/\alpha - \beta}$, then Auer et al. obtain
$O^*(T^{1-1/(\beta+2)})$ regret, and again show this is optimal. The parameter
$\beta \ge 0$ is the zooming dimension, and can be thought of as a measure of
the difficulty of the problem. Note that $\beta$ depends both upon the
behaviour of $\mu$ near its maxima, and also on the smoothness $\mu$ is
assumed to have.

By generalising the notions of smoothness and zooming dimension, later
authors obtain $O^*(T^{1-1/(\beta+2)})$ regret in a variety of contexts
(Kleinberg et al., 2008; Cope, 2009; Bubeck et al., 2011b; Yu and Mannor,
2011). For example, when $X = [0, 1]^p$, and $\mu$ has finitely many quadratic
maxima, Bubeck et al. give a description of the problem having zooming
dimension $\beta = 0$, and thus obtain $O^*(\sqrt{T})$ regret.

These approaches, however, all require knowledge of the shape of $\mu$; for
example, Bubeck et al. (2011b) require known bounds on the eigenvalues 
of the Hessian at the maxima. When such assumptions are false, these 
algorithms may achieve worse rates, or simply fail to converge. 

Methods requiring less information have been studied only recently: 
Bubeck et al. (2011a) provide an algorithm which can adapt to the Lipschitz
constant of $\mu$, but only after making further smoothness assumptions;
Munos (2011) provides an algorithm which fully adapts to the smoothness
of $\mu$, but only for a simpler problem and loss function.

Perhaps the best previous results have come from the related problem of 
tree-armed bandits (TAB), where the arms are assumed only to lie within 
some tree structure. Many algorithms for continuum-armed bandits involve 
optimising over trees on $[0, 1]^p$; indeed, CAB can be viewed as a special
case of TAB, as in Section 2.3.

Tree-armed bandits are also called bandits on taxonomies, and have 
been previously studied by several authors (Kocsis and Szepesvari, 2006; 
Coquelin and Munos, 2007; Pandey et al., 2007; Bubeck et al., 2011b; Yu and 
Mannor, 2011). As well as being applicable to continuum-armed bandits, 
these methods are also of interest in artificial intelligence, and are currently 
used by some of the top-rated computer programs in the game of go (Gelly 
et al., 2012, and references therein). 

The above methods again require knowledge of the behaviour of $\mu$ near its
maxima. For finite trees, however, Slivkins (2011) provides a TAB algorithm
which adapts to the shape of $\mu$. Slivkins' algorithm estimates the
smoothness of $\mu$ from the data, thereby obtaining $O^*(T^{1-1/(\beta+2)})$
regret without prior information.

In this paper, we build on Slivkins' approach, making the following new 
contributions. Firstly, we extend Slivkins' algorithm to infinite trees, again 
obtaining $O^*(T^{1-1/(\beta+2)})$ regret. We then also give lower bounds, showing
that these rates are optimal for the problems considered. 

Secondly, we show how related techniques can be used, not only to solve 
bandit problems over a single tree, but also to choose the tree adaptively. 
In many problems, we may have more than one tree structure available to 
describe the space X. In this case, we provide an algorithm which adaptively 
combines these structures into a single tree over X, so as to minimise the 
zooming dimension $\beta$.

Finally, we apply this technique to continuum-armed bandits, obtaining
$O^*(\sqrt{T})$ regret, with no prior information, for a wide variety of
reward functions $\mu$. In particular, we establish this rate for any $\mu$
satisfying a condition we call zooming continuity. We give a full definition
of this condition in Section 2.3, but for now remark that it includes the
following.

Example 1. Let $\mu : [0, 1]^p \to \mathbb{R}$ be continuous, with finitely
many maxima $x_1^*, \dots, x_L^*$, and in a neighbourhood of each maximum
$x_l^*$, let $\mu$ satisfy one of the following as $x \to x_l^*$.

(i) $x_l^*$ is an elliptical maximum,

$$\mu(x) = \mu(x_l^*) - \|A_l(x - x_l^*)\|^{\alpha_l}(1 + o(1)),$$

for a positive-definite matrix $A_l$, and power $\alpha_l > 0$.

(ii) $x_l^*$ is a separable maximum,

$$\mu(x) = \mu(x_l^*) - \sum_{i=1}^{p} c_{l,i}|x_i - x_{l,i}^*|^{\alpha_{l,i}}(1 + o(1)),$$

for constants $c_{l,i} > 0$, and powers $\alpha_{l,i} > 0$.

Then $\mu$ is zooming-continuous.

If $\mu$ satisfies these conditions, we can then obtain $O^*(\sqrt{T})$
regret, even without knowing the form of the maximum or the parameters $A_l$,
$\alpha_l$, $c_{l,i}$ and $\alpha_{l,i}$. We note that:

(i) elliptical maxima include all quadratic maxima, letting $A_l$ be the
square root of the Hessian;

(ii) separable maxima allow us to model functions which depend more
strongly on some coordinates $x_i$ than others; and

(iii) many other similar functions are also zooming-continuous.

We thus establish $O^*(\sqrt{T})$ regret, with no prior information, for a
wide variety of reward functions $\mu$.

In Section 2, we give a detailed description of the problems we will solve, 
and the assumptions we will make. In Section 3, we describe our algorithms, 
give regret bounds, and discuss implementation. Finally, in Section 4, we 
give proofs of our results. 

2 Problem statement 

We now describe in detail the problems we will consider. In Section 2.1, 
we describe the problem of tree-armed bandits with multiple trees; in Sec- 
tion 2.2, state the assumptions we will make in our solution; and in Sec- 
tion 2.3, link these assumptions to continuum-armed bandits, and our zoom- 
ing continuity condition. 

2.1 Multiple-tree-armed bandits 

We will aim to solve a multi-armed bandit problem as in the introduction,
where the arms $x_t$ lie in a product space $X := \prod_{i=1}^{p} X_i$. Over
each axis $X_i$, we will be given a tree $\mathcal{T}_i$; this tree may be
finite or infinite, depending on the axis $X_i$.

Each tree $\mathcal{T}_i$ will have root $X_i$, and contain nodes
$U \subseteq X_i$. We require that if $U$ is not a leaf node, the children $V$
of $U$ partition $U$. We also require that all nodes have at most $g$
children, for a constant $g \in \mathbb{N}$.

In addition, we will require that each path through the tree $\mathcal{T}_i$
uniquely identifies an element of $X_i$. To be precise, consider a sequence
$U_j$ of nodes in $\mathcal{T}_i$, which starts with $U_0 = X_i$, terminates
if $U_j$ is a leaf node, and otherwise picks $U_{j+1}$ from the children of
$U_j$. We will require that each sequence $U_j$ is associated with a point
$x_i \in X_i$.

We can then define distributions $\pi_i$ over $X_i$, in terms of the
$\mathcal{T}_i$. For each $i = 1, \dots, p$, construct such a sequence $U_j$,
choosing each $U_{j+1}$ uniformly at random from the children of the $U_j$.
Let $\pi_i$ be the distribution of the point $x_i$ associated with this
sequence.

We then make the final requirement that, $\pi_i$-almost surely, $x_i$ lies
within the (potentially infinite) intersection of the $U_j$. We also define
the distribution $\pi$ over $X$, given by the product of the $\pi_i$; we will
return to $\pi$ later.

We have thus described trees $\mathcal{T}_i$ over the axes $X_i$. To work
with the product space $X$, we will consider sets given by products of nodes
$U_i \in \mathcal{T}_i$. We will call such a set a box, and define the
collection of boxes

$$\mathcal{B} := \Bigl\{\prod_{i=1}^{p} U_i : U_i \in \mathcal{T}_i\Bigr\}.$$
We define the width of a box $B \in \mathcal{B}$ to be

$$W(B) := \max_{x \in B} \mu(x) - \min_{x \in B} \mu(x).$$

If we were given access to these widths, we could consider $X$ as a metric
space, and $\mu$ a Lipschitz function on that space, as noted by Slivkins
(2011).

The difficulty of our problem would then be determined by its zooming
dimension, defined as follows. Given $S \subseteq X$, define the covering
number $N_\delta(S)$ to be the smallest number of boxes, of width at most
$\delta$, needed to cover the set $S$. Let

$$X_\delta := \{x \in X : \mu^* - \mu(x) \le \delta\}.$$

We will define the zooming dimension $\beta$, with zooming constant $\kappa$,
as

$$\inf\{\beta \ge 0 : N_{\delta/12p}(X_\delta) \le \kappa\delta^{-\beta} \text{ for all } \delta > 0\}.$$

The zooming dimension is thus a property of how variable the function
$\mu$ is, in regions where it is close to optimal. If we knew the widths $W$,
we could solve this bandit problem with the zooming algorithm of Kleinberg
et al. (2008), for example, obtaining regret $O^*(T^{1-1/(\beta+2)})$.
However, we will instead attempt to do so without access to the widths,
aiming to achieve the same rate of regret, but without prior information on
$\mu$.

When $p = 1$, and our single tree is finite, such a result has already
been shown possible by Slivkins (2011). We will first extend this result
to multiple, infinite trees. When dealing with multiple trees, we will need
to search not only for the regions where $\mu$ is large, but also for the
best combination of boxes to describe them.

2.2 Grid and quality conditions

In order to proceed, we will have to make two additional assumptions on
the structure of our problem; we will show later on that these assumptions
are relatively benign, and cover many problems of interest.

Our first assumption is a strengthening of the requirement that the problem
have zooming dimension $\beta$; to describe our assumption, we will need
some definitions. We will consider partitions $\mathcal{B}_m$ of $X$, made up
of boxes $B \in \mathcal{B}$. We will say a box $C$ is on the partition
$\mathcal{B}_m$, if $C$ is a union of boxes in $\mathcal{B}_m$; and that
$\mathcal{B}_m$ is a refinement of $\mathcal{B}_l$, if this is true for all
$C \in \mathcal{B}_l$.

We will also need the concept of a grid. For $i = 1, \dots, p$, let
$S_i \subseteq \mathcal{T}_i$, and let the nodes $U_i \in S_i$ be disjoint.
Then the grid $\mathcal{G}$ is the set of boxes

$$\mathcal{G} := \Bigl\{\prod_{i=1}^{p} U_i : U_i \in S_i\Bigr\}.$$
We will say grids $\mathcal{G}, \mathcal{G}'$ are separated if, for any box
$B$ on $\mathcal{G} \cup \mathcal{G}'$, $B$ is either on $\mathcal{G}$, or on
$\mathcal{G}'$.

Finally, we will need to define the depth $d(B)$ of a box $B \in \mathcal{B}$.
Given $B = \prod_{i=1}^{p} U_i$, $U_i \in \mathcal{T}_i$, let $d(B)$ be the
maximum depth of any $U_i$ in its corresponding tree $\mathcal{T}_i$.

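Definition 2. We will say that $\mu$ satisfies the grid condition, with
zooming dimension $\beta \ge 0$, and constants $\kappa, \lambda > 0$, if, for
all $m \in \mathbb{N}$, we have partitions $\mathcal{B}_m$ of $X$ satisfying
the following.

(i) For $\delta_m := 2^{-m}$, the region $X_{\delta_m}$ has a cover
$\mathcal{C}_m \subseteq \mathcal{B}_m$, with
$|\mathcal{C}_m| \le \kappa\delta_m^{-\beta}$, and $W(B) \le \delta_m/12p$
for any $B \in \mathcal{C}_m$.

(ii) If $l \le m$, then $\mathcal{B}_m$ is a refinement of $\mathcal{B}_l$.

(iii) $\mathcal{C}_m \subseteq \bigcup_{l} \mathcal{G}_{l,m}$, for separated
grids $\mathcal{G}_{l,m}$.

(iv) If $B \in \mathcal{C}_m$, then $d(B) \le \lambda m$.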

We note that (i) is merely the statement that our problem has zooming
dimension $\beta$, with constant $\kappa$. If $p = 1$, then (ii) and (iii) are trivially
satisfiable; however, for p > 1 they are necessary to ensure we can choose 
the correct combination of trees efficiently. Finally, if the trees are finite, 
then (iv) is trivial; otherwise, it is necessary to ensure we do not descend 
our infinite trees too fast. 

We will see that for problems of interest these conditions are satisfied, 
with the same zooming dimension $\beta$ as before. The extra requirements are
necessary simply to rule out pathological cases, where the depth of the trees, 
or the interactions between them, might cause us to achieve poorer regret. 

The second assumption is a modification of the one made by Slivkins
(2011), in the case $p = 1$. To maximise $\mu$ efficiently, without prior
information, we will have to estimate its smoothness from the data. We will
thus require a bound on the difficulty of this estimation; we will later see
that many problems of interest satisfy such a bound.

For any box $B \in \mathcal{B}$, define its average reward,

$$\mu(B) := \mathbb{E}_\pi[\mu \mid B].$$

Also define the maximum and average badness of a box $B$,

$$\delta(B) := \mu^* - \min_{x \in B} \mu(x), \qquad \Delta(B) := \mu^* - \mu(B). \tag{1}$$
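To make the width and the two badness measures concrete, the sketch below approximates them on a grid for a one-dimensional box. The mean function, the grid resolution, and the use of the uniform distribution as $\pi$ are assumptions made only for this illustration.

```python
def box_stats(mu, lo, hi, mu_star, n=1001):
    """Approximate W(B), delta(B) and Delta(B) for B = [lo, hi] on a grid."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    vals = [mu(x) for x in xs]
    width = max(vals) - min(vals)           # W(B)
    max_badness = mu_star - min(vals)       # delta(B)
    avg_badness = mu_star - sum(vals) / n   # Delta(B), averaging uniformly
    return width, max_badness, avg_badness


# Hypothetical mean function with a quadratic maximum at x = 0.5.
mu = lambda x: 1.0 - (x - 0.5) ** 2
w, d, a = box_stats(mu, 0.0, 0.5, mu_star=1.0)
```

Since the box touches the maximum, the width and the maximum badness coincide, while the average badness is smaller.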

Definition 3. We will say $\mu$ satisfies the quality condition, with quality
$\gamma \in (0, 1)$ if, for any $m \in \mathbb{N}$, and box $B$ on the cover
$\mathcal{C}_m$, there exist two sub-boxes $B_1, B_2 \subseteq B$ satisfying
the following.

(i) We have some index $i = 1, \dots, p$, nodes
$U_{i,1}, U_{i,2} \in \mathcal{T}_i$, and nodes $U_j \in \mathcal{T}_j$ for
$j \ne i$, such that
$B_k = U_1 \times \dots \times U_{i-1} \times U_{i,k} \times U_{i+1} \times \dots \times U_p$,
for $k = 1, 2$.

(ii) $\pi(B_k \mid B) \ge \gamma$, for $k = 1, 2$.

(iii) $\mu(B_1) - \mu(B_2) \ge \frac{1}{p}\bigl[W(B) - \frac{1}{2}\delta(B)\bigr]$.

We thus assume that for each box $B$ on the covers $\mathcal{C}_m$, we can
estimate the width of $B$, and an axis $i$ along which $\mu$ varies, without
having to descend too far down the trees $\mathcal{T}_i$. We will see in the
following that this assumption holds for many problems of interest.

Our condition is similar to the one given by Slivkins (2011), but with
a few alterations. Firstly, Slivkins' quality condition was assumed to hold
only for boxes $B$ containing a maximum $x^*$, rather than all $B$ on covers
$\mathcal{C}_m$. Our stronger condition is used to avoid a flaw in Slivkins'
argument,^1 and allows us to control the behaviour of our algorithms more
precisely.

Secondly, we have required that the boxes $B_1, B_2$ must agree except in
one axis $i$. This addition is trivial when $p = 1$, but for $p > 1$ is
necessary to ensure we decide correctly between the multiple trees. Finally,
we have also altered the bound in (iii). The $1/p$ term allows for the smaller
width we may detect when looking for change along a single axis, while the
$\delta(B)$ term makes our condition easier to satisfy when considering
sub-optimal boxes $B$.



2.3 Zooming continuity 

In general, the above conditions are quite abstract. However, for the spe- 
cial case of continuum-armed bandits, they are both implied by a simpler 
assumption, satisfied for many typical reward functions $\mu$, which we call
zooming continuity. In particular, this condition holds for the functions in 
Example 1. 

First, we will need some definitions. Given any $U \subseteq [0, 1]^p$, define
its diameter along axis $i$,

$$\mathrm{diam}_i(U) := \sup\{|x_i - y_i| : x, y \in U\},$$

and its overall diameter, $\mathrm{diam}(U) := \max_{i=1}^{p} \mathrm{diam}_i(U)$.
Given $x \in \mathbb{R}^p$, define its size, relative to $U$, to be

$$\|x\|_U := \max_{i=1}^{p} \frac{|x_i|}{\mathrm{diam}_i(U)}.$$

Definition 4. The function $f : [0, 1]^p \to \mathbb{R}$ is zooming-continuous
if:

(i) $f$ is continuous, with finitely many maxima; and

(ii) for any maximum $x^*$ of $f$, neighbourhood $U$ of $x^*$, and
$x, y, z \in U$,

$$\frac{|f(x) - f(y)|}{\sup_z |f(x^*) - f(z)|} \to 0 \quad \text{as } \mathrm{diam}(U),\ \|x - y\|_U \to 0.$$

^1 The proof of Slivkins' Lemma 4.4(b) incorrectly assumes that all
deactivated boxes have been selected.
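The per-axis diameter and the relative size can be computed directly for a finite point set. The sketch below uses a finite set as a stand-in for a neighbourhood $U$, so the supremum becomes a maximum; the sample points are hypothetical.

```python
def diam_along(points, i):
    """diam_i(U): largest spread of the point set U along axis i."""
    coords = [x[i] for x in points]
    return max(coords) - min(coords)


def diam(points):
    """Overall diameter: the largest per-axis diameter."""
    return max(diam_along(points, i) for i in range(len(points[0])))


def relative_size(x, points):
    """||x||_U: max over axes of |x_i| / diam_i(U)."""
    return max(abs(x[i]) / diam_along(points, i) for i in range(len(x)))


# Hypothetical finite stand-in for a neighbourhood U in [0, 1]^2.
U = [(0.4, 0.45), (0.6, 0.55), (0.5, 0.5)]
size = relative_size((0.1, 0.1), U)
```

The same displacement counts as larger along the axis where $U$ is narrower, which is what lets the condition treat elongated neighbourhoods fairly.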

To relate this to tree-armed bandits, we will need to construct trees over
the space $[0, 1]^p$. Define the dyadic tree $\mathcal{T}$ to be the tree on
$[0, 1)$, where each node $[a, b)$ has children $[a, \frac{1}{2}(a + b))$ and
$[\frac{1}{2}(a + b), b)$. We may instead consider this a tree on $[0, 1]$,
allowing nodes with upper bound 1 to contain the point 1. We then define the
associated distribution $\pi$ in the natural way, so that $\pi$ is the
Lebesgue measure.
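A dyadic tree is easy to represent implicitly. The following sketch, in which nodes are plain `(a, b)` tuples and the helper names are hypothetical, generates children on demand and performs the uniform random descent used to define the associated distribution.

```python
import random


def children(node):
    """Children of the dyadic node [a, b): its left and right halves."""
    a, b = node
    mid = 0.5 * (a + b)
    return [(a, mid), (mid, b)]


def random_descent(node, depth, rng):
    """Descend `depth` levels from `node`, picking children uniformly."""
    for _ in range(depth):
        node = rng.choice(children(node))
    return node


rng = random.Random(0)
interval = random_descent((0.0, 1.0), 10, rng)
```

After ten levels the node is equally likely to be any of the $2^{10}$ intervals of that depth, which is consistent with the limiting distribution being uniform on $[0, 1]$.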

We will thus consider CAB as a TAB problem, constructing the space
$[0, 1]^p$ as the product of $p$ dyadic trees. With this construction, we can
show that any zooming-continuous $\mu$ will satisfy our grid and quality
conditions.

Theorem 5. If $\mu : [0, 1]^p \to \mathbb{R}$ is zooming-continuous, then for
dyadic trees $\mathcal{T}_i$ on $[0, 1]$, $\mu$ satisfies Definitions 2 and 3,
with zooming dimension $\beta = 0$, constants $\kappa, \lambda > 0$, and
quality $\gamma \in (0, 1)$.



3 Algorithms and results 

We now provide solutions to these bandit problems. We will describe our 
algorithms for tree-armed bandits, noting that, as above, we can consider 
continuum-armed bandits as a special case. In Section 3.1, we give math- 
ematical descriptions of our algorithms; in Section 3.2, provide bounds on 
their regret; and in Section 3.3, discuss efficient implementation. 



3.1 Adaptive-treed bandits 

First, we define our algorithms. To begin with, we will assume we know (or
can lower bound) the quality $\gamma$; we return later to the problem where
the quality is unknown.

At time $t$, we will select a box $B \in \mathcal{B}$, and play an arm $x_t$
sampled at random from $\pi \mid B$. We will maintain a collection of active
boxes partitioning $X$, and select $B$ from the active boxes to maximise a
quantity called the index, to be defined.

If $B$ was selected at time $t$, and $x_t \in C \subseteq B$ for a box $C$,
we will say that $C$ was hit at time $t$, and define $H_t(C)$ to be the event
that this occurs. Let $n_t(B)$ be the number of times a box $B$ has been hit
before time $t$, and if $n_t(B) > 0$, let $\mu_t(B)$ be the corresponding
average reward.

We next define a confidence radius $r_t(B)$, chosen so that, with high
probability, $|\mu_t(B) - \mu(B)| \le r_t(B)$. For any box
$B \in \mathcal{B}$, define the constant

Fix also $\varepsilon \in (0, 1)$, and let $\tau := 4\varepsilon^{-1}$. Then
define

$$r_t(B) := 2\sqrt{\log[\rho(B)(\tau + n_t(B))]/n_t(B)}.$$

The error rate $\varepsilon$ controls the accuracy of our bound; we will show
that our results hold with probability at least $1 - \varepsilon$.

Define also the index of $B$ at time $t$,

$$I_t(B) := \mu_t(B) + (1 + 2pK)\,r_t(B),$$

where the constant $K := 8\sqrt{2}/\gamma$; if $n_t(B) = 0$, we take
$I_t(B) = r_t(B) = \infty$. We may then select each arm $x_t$ from an active
box $B$ maximising $I_t$.
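The confidence radius and index translate directly into code. In the sketch below the constant attached to each box is passed in as a plain number `rho`, since its definition is not reproduced here, and the reading of the threshold as four divided by the error rate is taken as an assumption; the function names are hypothetical.

```python
import math


def confidence_radius(n, rho, tau):
    """r_t(B) = 2 * sqrt(log(rho * (tau + n)) / n); infinite if n == 0."""
    if n == 0:
        return math.inf
    return 2.0 * math.sqrt(math.log(rho * (tau + n)) / n)


def index(mean, n, rho, tau, p, gamma):
    """I_t(B) = mu_t(B) + (1 + 2 p K) r_t(B), with K = 8 sqrt(2) / gamma."""
    if n == 0:
        return math.inf
    K = 8.0 * math.sqrt(2.0) / gamma
    return mean + (1.0 + 2.0 * p * K) * confidence_radius(n, rho, tau)


eps = 0.1
tau = 4.0 / eps  # assumed reading: tau = 4 / epsilon
r_small = confidence_radius(10, rho=2.0, tau=tau)
r_large = confidence_radius(1000, rho=2.0, tau=tau)
```

The radius shrinks roughly like the inverse square root of the hit count, so rarely sampled boxes keep a large index and continue to be explored.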

Our goal is for $I_t(B)$ to be an upper bound for $\max_{x \in B} \mu(x)$.
To do so, we must carefully maintain the set of active boxes $B$, ensuring
that $W(B) \le 2pK r_t(B)$. When this bound may no longer hold, we split the
box $B$ into smaller boxes, ensuring also that we do so in the most efficient
way.

To proceed, we will need the width estimate

$$W_t(B) := \max_{(s_1, s_2, B_1, B_2) \in \mathcal{M}} L_{s_1}(B_1) - U_{s_2}(B_2),$$

where

$$L_t(B) := \mu_t(B) - r_t(B), \qquad U_t(B) := \mu_t(B) + r_t(B),$$

and $\mathcal{M}$ is any set for which:

(i) if $(s_1, s_2, B_1, B_2) \in \mathcal{M}$, then $s_1, s_2 \le t$, and
$B_1, B_2 \subseteq B$ satisfy Definition 3(i); and

(ii) if $B_1, B_2$ additionally satisfy Definition 3(ii), then
$(t, t, B_1, B_2) \in \mathcal{M}$.

The precise form of $\mathcal{M}$ is unimportant, but we will later make a
choice which allows for convenient implementation. $L_t(B)$ and $U_t(B)$ are
thus lower and upper confidence bounds on the value $\mu(B)$; if all such
bounds are valid, for all $s \le t$, we have $W_t(B) \le W(B)$.

When we begin, we will denote the root box X as active. Over time, we 
will split active boxes into sub-boxes, so as to enforce the following invariant. 

Invariant 6. $W_t(B) \le K r_t(B)$ for all active boxes $B$.
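Checking the invariant for one box amounts to comparing a maximum of bound differences with a multiple of the confidence radius. The sketch below assumes the relevant lower and upper confidence bounds have already been collected as pairs; the numbers are hypothetical.

```python
def width_estimate(pairs):
    """W_t(B): max of L(B1) - U(B2) over stored (L(B1), U(B2)) pairs."""
    return max(lower - upper for lower, upper in pairs)


def invariant_holds(pairs, K, radius):
    """Invariant 6 for a single box B: W_t(B) <= K * r_t(B)."""
    return width_estimate(pairs) <= K * radius


# Hypothetical confidence bounds for two pairs of sub-boxes of one box B.
pairs = [(0.62, 0.55), (0.40, 0.70)]
needs_split = not invariant_holds(pairs, K=2.0, radius=0.02)
```

A positive width estimate means some sub-box is confidently better than another, which is the signal that the box has become too coarse for its current confidence radius.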

When splitting a box $B$, we must have some $B_1, B_2 \subseteq B$ maximising
$W_t(B)$, which differ only along index $i$. We then split $B$ along axis $i$,
replacing it with the boxes given by descending one level in $\mathcal{T}_i$.
In full, our approach is thus described by Algorithm 1.

When $\gamma$ is unknown, we can provide asymptotic results using the
doubling trick. We approach the problem in stages: at stage
$m \in \mathbb{N}$, we discard all previous observations, and run ATB for
$2^m$ steps, setting $\gamma = m^{-1}$, and
$\tau = \frac{2}{3}\pi^2 m^2 \varepsilon^{-1}$. The procedure is thus given by
Algorithm 2.
Data: space $X$, trees $\mathcal{T}_i$, quality $\gamma$, error rate $\varepsilon$

activate $X$;
for $t = 1, 2, \dots$ do
    while Invariant 6 is violated, by $B = \prod_{i=1}^{p} U_i$, with $B_1, B_2$
            differing along axis $i$ do
        deactivate $B$;
        for $V_i$ a child of $U_i$ in $\mathcal{T}_i$ do
            activate $U_1 \times \dots \times U_{i-1} \times V_i \times U_{i+1} \times \dots \times U_p$;
        end
    end
    select an active box $B$ maximising $I_t$, breaking ties arbitrarily;
    play an arm $x_t$ drawn at random from $\pi \mid B$;
end

Algorithm 1: Adaptive-treed bandits (ATB)
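The loop structure of Algorithm 1 can be illustrated with a heavily simplified one-dimensional sketch. It is not the algorithm as analysed: it uses a single dyadic tree on $[0, 1)$, compares only the two halves of each box using current (not historical) confidence bounds, restarts the statistics of newly created boxes, substitutes the stand-in `2 ** (depth + 1)` for the per-box constant, and takes `K` as a free parameter. All names are hypothetical.

```python
import math
import random


class Box:
    """Dyadic interval [a, b) with hit counts and reward sums per half."""

    def __init__(self, a, b, depth):
        self.a, self.b, self.depth = a, b, depth
        self.n = [0, 0]          # hits in the left and right halves
        self.s = [0.0, 0.0]      # summed rewards in each half


def radius(n, depth, tau):
    # Assumption: 2 ** (depth + 1) is a stand-in for the per-box constant.
    if n == 0:
        return math.inf
    return 2.0 * math.sqrt(math.log(2.0 ** (depth + 1) * (tau + n)) / n)


def violates(box, K, tau):
    """True if the width estimate of `box` exceeds K times its radius."""
    if 0 in box.n:
        return False             # no estimate yet for one of the halves
    m = [box.s[k] / box.n[k] for k in (0, 1)]
    r = [radius(box.n[k], box.depth + 1, tau) for k in (0, 1)]
    width = max(m[0] - r[0] - (m[1] + r[1]), m[1] - r[1] - (m[0] + r[0]))
    return width > K * radius(sum(box.n), box.depth, tau)


def index(box, K, tau):
    n = sum(box.n)
    if n == 0:
        return math.inf
    return sum(box.s) / n + (1.0 + 2.0 * K) * radius(n, box.depth, tau)


def atb_1d(reward, T, K, eps=0.1, seed=0):
    rng = random.Random(seed)
    tau = 4.0 / eps
    active = [Box(0.0, 1.0, 0)]
    for _ in range(T):
        pending, active = active, []
        while pending:                      # enforce the invariant by splitting
            box = pending.pop()
            if violates(box, K, tau):
                mid = 0.5 * (box.a + box.b)
                pending.append(Box(box.a, mid, box.depth + 1))
                pending.append(Box(mid, box.b, box.depth + 1))
            else:
                active.append(box)
        box = max(active, key=lambda b: index(b, K, tau))
        x = rng.uniform(box.a, box.b)       # arm drawn uniformly from the box
        half = 0 if x < 0.5 * (box.a + box.b) else 1
        box.n[half] += 1
        box.s[half] += reward(x)
    return sorted((b.a, b.b) for b in active)


intervals = atb_1d(lambda x: 1.0 if x < 0.5 else 0.0, T=2000, K=1.0)
```

With a step-shaped reward the root is split once the two halves are confidently different, and the resulting boxes are never split again because the reward is constant within each of them. The value `K=1.0` is chosen only so that a split occurs within a short run.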



Data: space $X$, trees $\mathcal{T}_i$, error rate $\varepsilon$

for $m = 1, 2, \dots$ do
    run Algorithm 1 for $2^m$ steps, with $\gamma = m^{-1}$,
        $\tau = \frac{2}{3}\pi^2 m^2 \varepsilon^{-1}$;
end

Algorithm 2: Adaptive-treed bandits with doubling trick (ATB-D)
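The stage schedule of the doubling trick can be sketched as follows. The stage parameters used here, a quality of one over the stage number and a threshold of two-thirds pi squared times the squared stage number over the error rate, are an assumed reading of the text and should be checked against the original; the function names are hypothetical.

```python
import math


def stage_parameters(m, eps):
    """Stage m of the doubling trick: (number of steps, gamma, tau)."""
    steps = 2 ** m
    gamma = 1.0 / m
    tau = (2.0 / 3.0) * math.pi ** 2 * m ** 2 / eps
    return steps, gamma, tau


def stages_needed(T):
    """Smallest M with 2 + 4 + ... + 2^M >= T."""
    M = 1
    while 2 ** (M + 1) - 2 < T:
        M += 1
    return M


M = stages_needed(1000)
```

Under this reading, the quantities 4/tau sum to at most the error rate over all stages, because the reciprocals of the squares sum to pi squared over six; this is one way the stage-wise error budgets can be combined.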

Of course, this approach involves discarding some observations; we
conjecture that a more sophisticated algorithm could accommodate shrinking
$\gamma$ without restarting. However, ATB-D suffices to establish
$O^*(\sqrt{T})$ regret for a wide range of problems, without prior
information.

In practice, it would likely prove more efficient to use ATB with a fixed
value of $\gamma$, even though Definition 3 might then not apply. While such a
procedure would be hard to control theoretically, it could still be expected
to have good practical performance, as noted by Slivkins (2011).

3.2 Regret bounds 

We now provide bounds on the regret of our algorithms. Let

$$\mathcal{P} = \mathcal{P}(X, \mathcal{T}, \beta, \kappa, \lambda, \gamma)$$

denote the class of arm distributions $P(x)$ on $X$, whose mean functions
$\mu$ satisfy Definitions 2 and 3 with respect to the trees $\mathcal{T}_i$,
with constants $\beta \ge 0$, $\kappa, \lambda > 0$, and $\gamma \in (0, 1)$.
We first establish a lower bound on the regret of any bandit algorithm over
$\mathcal{P}$.

Theorem 7. Suppose the trees $\mathcal{T}_i$ have no leaf nodes, and let
$\beta \ge 0$. For large enough $\kappa, \lambda > 0$, small enough
$\gamma, \varepsilon, c \in (0, 1)$, and any sequential choice of arms $x_t$,

We can then show that, up to logarithmic factors, ATB attains this
minimax rate. Note that for $\beta > 0$, we recover the bound of Slivkins
(2011); for $\beta = 0$, the extra log factor corrects a minor error in that
result.^2

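Theorem 8. Let $\gamma, \varepsilon \in (0, 1)$. Under ATB, for any
$\beta \ge 0$, $\kappa, \lambda > 0$, and time $T$,

where the constant $C = O(\gamma^{-1}\kappa\lambda p\log(g))$, and the
logarithmic term

$$\begin{cases} \log(T)/(2^\beta - 1), & \beta > 0, \\ \log(T)^2, & \beta = 0. \end{cases}$$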



We next provide a bound on the regret of ATB-D. We can show that
ATB-D obtains near-minimax rates of regret, without knowledge of the quality
$\gamma$.

Theorem 9. Let $\varepsilon \in (0, 1)$. Under ATB-D, for any $\beta \ge 0$,
$\kappa, \lambda > 0$, $\gamma \in (0, 1)$, and large enough time $T$,

sup P — f > - — < e. 



where the constant $C = O(\kappa\lambda p\log(g))$, and the logarithmic term



VT 



$$\begin{cases} \log(T)^2/(2^\beta - 1), & \beta > 0, \\ \log(T)^3, & \beta = 0. \end{cases}$$



Finally, we note that applying this result to continuum-armed bandits,
with a zooming-continuous reward function $\mu$, gives our $O^*(\sqrt{T})$
bound on the regret.

Corollary 10. Let $\varepsilon \in (0, 1)$. If $X = [0, 1]^p$, and $\mu$ is
zooming-continuous, then under ATB-D, using dyadic trees $\mathcal{T}_i$ over
$[0, 1]$, with probability at least $1 - \varepsilon$,

$$R_T = O^*(\sqrt{T}).$$

In particular, this bound holds for the functions $\mu$ in Example 1.



^2 In the proof of Slivkins' Theorem 2.3, when $\beta = 0$, $C_j$ is
incorrectly assumed to be $O(1)$.
3.3 Efficient implementation

It remains to discuss the implementation of our algorithms. We will show 
that, for a careful implementation, they run in almost linear time. 

To implement ATB, we will store the active boxes $B$ in a priority queue,
sorted by their index $I_t(B)$; this allows us to efficiently find a box of
maximal index. For each active $B$, we will store the number of observations
$n_t(B)$, and the current estimate $\mu_t(B)$.
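One way to realise such a priority queue in Python is with the standard `heapq` module, which provides a min-heap; negating the index turns it into a max-queue. The class name and the string labels standing in for boxes are hypothetical.

```python
import heapq


class BoxQueue:
    """Max-priority queue of boxes keyed by index, built on heapq."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so boxes themselves are never compared

    def push(self, index, box):
        heapq.heappush(self._heap, (-index, self._count, box))
        self._count += 1

    def pop_max(self):
        neg_index, _, box = heapq.heappop(self._heap)
        return -neg_index, box


queue = BoxQueue()
queue.push(0.7, "B1")
queue.push(1.3, "B2")
queue.push(0.9, "B3")
best_index, best_box = queue.pop_max()
```

After a box is played its index changes, so in this sketch it would be popped, updated, and pushed back; each such operation costs time logarithmic in the number of active boxes.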

To ensure Invariant 6 is satisfied, we will also need to store some
additional quantities. Firstly, from the definition of $\pi$, we note there
must be a constant $\Gamma \in \mathbb{N}$ satisfying

$$\pi(C \mid B) \ge \gamma \implies d(C) \le d(B) + \Gamma,$$

for all boxes $B, C$. For each active box $B$, we then consider the set of
sub-boxes of bounded depth,

$$S(B) := \{C \in \mathcal{B} : C \subseteq B,\ d(C) \le d(B) + \Gamma\},$$

which has cardinality bounded by a fixed constant. For each $C \in S(B)$, we
again store the summary data $n_t(C)$ and $\mu_t(C)$.

Now, for each $i = 1, \dots, p$, we can define an equivalence relation
$\sim_i$ on sub-boxes $C, D \in S(B)$, with $C \sim_i D$ if they agree in all
axes except the $i$-th. Let $E$ denote an equivalence class with respect to
any of these relations; then for each $E$, we further store the extremal lower
and upper bounds,

$$L_t(E) = \max_{C \in E,\, s} L_s(C), \qquad U_t(E) = \min_{C \in E,\, s} U_s(C),$$

where the extrema are taken over the times $s \le t$ when $B$ was active. From
these bounds, we can compute and store the width estimate,

$$W_t(B) = \max_{E}\,[L_t(E) - U_t(E)].$$
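The grouping into equivalence classes can be sketched as below. Sub-boxes are written as tuples of per-axis node labels, and each is mapped to its stored extremal lower and upper bounds; the labels and numbers are hypothetical.

```python
from collections import defaultdict


def width_from_classes(bounds, p):
    """Width estimate: max over classes E of (max lower - min upper).

    `bounds` maps a sub-box, written as a tuple of p per-axis nodes, to its
    stored extremal (lower, upper) confidence bounds.
    """
    best = float("-inf")
    for i in range(p):
        classes = defaultdict(list)
        for box, (lower, upper) in bounds.items():
            key = box[:i] + box[i + 1:]   # agree on every axis except i
            classes[key].append((lower, upper))
        for members in classes.values():
            max_lower = max(l for l, _ in members)
            min_upper = min(u for _, u in members)
            best = max(best, max_lower - min_upper)
    return best


bounds = {
    ("l", "l"): (0.50, 0.70),
    ("r", "l"): (0.10, 0.30),
    ("l", "r"): (0.45, 0.65),
    ("r", "r"): (0.20, 0.40),
}
w = width_from_classes(bounds, p=2)
```

Here the largest gap is found among boxes that differ only along the first axis, which would also identify that axis as the one to split along.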

To make a new observation, we must sample $x_t$ from a distribution
$\pi \mid B$. For continuum-armed bandits, this is simply the uniform
distribution over a hypercube in $\mathbb{R}^p$, which can be easily sampled
from. In other contexts, we will likewise assume such a sample is possible;
note that we can always approximate it by randomly descending from $B$ in the
trees $\mathcal{T}_i$.
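For the continuum-armed case, sampling from a box reduces to drawing each coordinate independently and uniformly from its interval. A minimal sketch, with a hypothetical box given as a list of `(a, b)` pairs:

```python
import random


def sample_box(box, rng):
    """Draw a point uniformly from a product of intervals [a_i, b_i)."""
    return tuple(a + (b - a) * rng.random() for a, b in box)


rng = random.Random(1)
box = [(0.25, 0.5), (0.0, 0.125)]
x = sample_box(box, rng)
```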

When a new observation is made, we update all stored quantities as
necessary; if this observation causes a new box $B$ to be activated, then any
newly stored quantities related to $B$ can be computed from the design points
$x_t$ and observations $Y_t$. We then have the following result.

Theorem 11. In the settings of Theorems 8 and 9, with probability at least
$1 - \varepsilon$, the algorithms described run in time

$$O(T\log(T)).$$
4 Proofs



We now give proofs of our results. In Section 4.1, we prove results on 
zooming continuity; in Section 4.2, prove our lower bounds on the regret; 
in Section 4.3, give regret bounds for our algorithms; and in Section 4.4, 
bound their computational cost. 



4.1 Proofs on zooming continuity 

We begin with two lemmas, which show that the functions in Example 1 are 
zooming-continuous. 

Lemma 12. The functions $[0, 1]^p \to \mathbb{R}$ given by

$$f(x) = f^* - \|A(x - x^*)\|^{\alpha},$$

for a constant $f^* \in \mathbb{R}$, positive-definite matrix $A$, and power
$\alpha > 0$; and

$$g(x) = g^* - \sum_{i=1}^{p} c_i|x_i - x_i^*|^{\alpha_i},$$

for constants $g^* \in \mathbb{R}$, $c_i > 0$, and powers $\alpha_i > 0$, are
zooming-continuous.

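Proof. We first consider $f$. As $A$ is positive definite, it is
diagonalisable, with smallest and largest eigenvalues
$0 < \underline{\lambda} \le \overline{\lambda}$. Let $U$ be any neighbourhood
of $x^*$, with $L^\infty$-diameter $d$. Given $x, y \in U$, let
$u = \|A(x - x^*)\|/d$, $v = \|A(y - x^*)\|/d$. Then
$u, v \in [0, \overline{\lambda}]$, and

$$|u - v| \le \|A(x - y)\|/d \le \overline{\lambda}\|x - y\|/d \le \overline{\lambda}\|x - y\|_U,$$

so

$$\frac{|f(x) - f(y)|}{\sup_{z \in U}|f^* - f(z)|} \le \frac{\bigl|\|A(x - x^*)\|^{\alpha} - \|A(y - x^*)\|^{\alpha}\bigr|}{\underline{\lambda}^{\alpha} d^{\alpha}} \to 0$$

as $\|x - y\|_U \to 0$. Thus $f$ is zooming-continuous.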

We next consider $g$. Given a neighbourhood $U$ of $x^*$, choose the minimal
$h_i \ge 0$ for which the product of intervals

$$\prod_{i=1}^{p} [x_i^* - h_i,\ x_i^* + h_i]$$

contains $U$. Then for each $i$, $U$ must contain a point $x$ with
$x_i = x_i^* \pm h_i$, and thus

$$\sup_{z \in U} |g^* - g(z)| \ge \max_i c_i h_i^{\alpha_i}.$$

Given $x, y \in U$, define $u, v \in [0, 1]^p$ by $u_i = |x_i - x_i^*|/h_i$,
and $v_i = |y_i - x_i^*|/h_i$. Then

$$\max_i |u_i - v_i| \le 2\|x - y\|_U,$$

so

$$\frac{|g(x) - g(y)|}{\sup_{z \in U}|g^* - g(z)|} \le \frac{\sum_{i=1}^{p} c_i\bigl||x_i - x_i^*|^{\alpha_i} - |y_i - x_i^*|^{\alpha_i}\bigr|}{\max_i c_i h_i^{\alpha_i}} \le \sum_{i=1}^{p} |u_i^{\alpha_i} - v_i^{\alpha_i}| \to 0$$

as $\|x - y\|_U \to 0$. Thus $g$ is also zooming-continuous. □

Lemma 13. Suppose that $f : [0, 1]^p \to \mathbb{R}$ is continuous, with
maxima $x_1^*, \dots, x_L^*$. If, for each maximum $x_l^*$,

$$f(x_l^*) - f(x) = (g_l(x_l^*) - g_l(x))(1 + o(1))$$

as $x \to x_l^*$, for a zooming-continuous function $g_l$ with maximum
$x_l^*$, then $f$ is zooming-continuous.

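Proof. Let $x_l^*$ be a maximum of $f$, and $g_l$ the associated
zooming-continuous function. Then given $\varepsilon > 0$, for $\delta > 0$
small enough, we have the following. For any neighbourhood $U$ of $x_l^*$, and
$x, y, z \in U$, with $\mathrm{diam}(U), \|x - y\|_U \le \delta$,

$$\frac{|g_l(x) - g_l(y)|}{\sup_z |g_l(x_l^*) - g_l(z)|} \le \varepsilon.$$

For $\delta$ small enough, we additionally have

$$\frac{|f(x) - f(y)|}{\sup_z |f(x_l^*) - f(z)|} \le \frac{|g_l(x) - g_l(y)| + \varepsilon(2g_l(x_l^*) - g_l(x) - g_l(y))}{\sup_z \bigl[|g_l(x_l^*) - g_l(z)| - \varepsilon(g_l(x_l^*) - g_l(z))\bigr]} \le \frac{3\varepsilon}{1 - \varepsilon}.$$

Thus $f$ is also zooming-continuous. □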

As a corollary of Lemmas 12 and 13, we may deduce the claim in Ex- 
ample 1. We next establish that zooming continuity implies our grid and 
quality conditions. 

Proof of Theorem 5. We first consider Definition 2. Let $\mu$ have maxima
$x_1^*, \dots, x_L^*$. Then as $\mu$ is continuous, and $[0, 1]^p$ compact,
the sets $X_\delta$ must be the union of neighbourhoods $X_{l,\delta}$ of each
$x_l^*$, with $\mathrm{diam}(X_{l,\delta}) \to 0$ as $\delta \to 0$.

For $m \in \mathbb{N}$, let $U_{l,m} := X_{l,\delta_m}$. For fixed $l$, define
powers $k_i \in \mathbb{N}$ by

$$2^{-k_i} \le \mathrm{diam}_i(U_{l,m}) < 2^{-(k_i - 1)}.$$

Then for any $k \in \mathbb{N}$, we can cover the set $U_{l,m}$ with a grid
$\mathcal{G}_{l,m}$, composed of boxes with sides of length
$2^{-(k_i + k - 1)}$ along axis $i$.

Choosing this grid to be as small as possible, we will then need at most
$(2^k + 1)^p$ boxes; doing so for each $l$ will thus require at most

$$\kappa := L(2^k + 1)^p$$

boxes in total. Furthermore, since $\mathrm{diam}(U_{l,m}) \to 0$ as
$m \to \infty$, for $m$ large enough, these grids will be separated.

Define the cover $\mathcal{C}_m$ to be the minimal set of boxes $B$, in grids
$\mathcal{G}_{l,m}$, necessary to cover $X_{\delta_m}$. As $\mu$ is
zooming-continuous, for $k$ and $m$ large enough, the width of each box
$B \in \mathcal{C}_m$ will be at most $\delta/14p$, where

$$\delta := \mu^* - \inf\{\mu(x) : x \in B \in \mathcal{C}_m\}.$$

Pick $x \in B \in \mathcal{C}_m$ for which
$\mu^* - \mu(x) \ge (13/14)\delta$. Then, since $B$ must contain a point
$y \in X_{\delta_m}$, we have

$$\delta_m \ge \mu^* - \mu(y) \ge \mu^* - \mu(x) - \delta/14 \ge (6/7)\delta.$$

The boxes $B \in \mathcal{C}_m$ thus are of width at most $\delta_m/12p$.
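The constants in this step can be checked by direct arithmetic. The following worked chain assumes the width bound of $\delta/14p$ and the choice of a point $x$ with badness at least $(13/14)\delta$, as read from the surrounding argument.

```latex
\delta_m \;\ge\; \mu^* - \mu(y)
        \;\ge\; \tfrac{13}{14}\,\delta - \tfrac{1}{14}\,\delta
        \;=\; \tfrac{6}{7}\,\delta
\quad\Longrightarrow\quad
W(B) \;\le\; \frac{\delta}{14p}
     \;\le\; \frac{7}{6}\cdot\frac{\delta_m}{14p}
     \;=\; \frac{\delta_m}{12p}.
```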

We note that, by construction, the grids $\mathcal{G}_{l,m}$ get finer with
$m$; we can therefore easily construct partitions $\mathcal{B}_m$ of $X$,
which contain the grids $\mathcal{G}_{l,m}$, and refine as $m$ increases. We
have thus constructed, for $m$ larger than some $M$, partitions
$\mathcal{B}_m$, covers $\mathcal{C}_m$, and grids $\mathcal{G}_{l,m}$,
satisfying Definitions 2(i)-(iii), with zooming dimension $\beta = 0$.

To show Definition 2(iv), we note that for each $l$,
$x_l^* \in B \in \mathcal{C}_m$, for some box $B$ with

$$\delta(B) = W(B) \le \delta_m/12p < \delta_{m+1}.$$

Hence $B \subseteq U_{l,m+1}$, and

$$\mathrm{diam}_i(U_{l,m+1}) \ge 2^{-k}\,\mathrm{diam}_i(U_{l,m}).$$

Inductively, we may conclude that the parameters $k_i$, and thus the depths
of the boxes $B \in \mathcal{C}_m$, grow at most linearly with $m$.

For $m < M$, we note that, as $\mu$ is continuous, and $[0, 1]^p$ compact,
$\mu$ must be uniformly continuous. We thus have a single grid $\mathcal{G}$,
of finite size, containing boxes of width at most $\delta_m/12p$. If we then
set, for $m < M$,
$\mathcal{B}_m = \mathcal{C}_m = \mathcal{G}_m = \mathcal{G}$, we will satisfy
Definition 2 for all $m$, potentially after increasing $\kappa$, $\lambda$ and
$k$.

It remains to consider Definition 3. Given $B \in \mathcal{C}_m$, for $m$
large, we may apply Definition 4(ii) to the neighbourhood of $x_l^*$. As
above, we obtain a cover $\mathcal{C}$ of $B$, given by boxes $C$ with
$W(C) \le \delta(B)/32$, and $d(C) \le d(B) + \Gamma$, for a constant
$\Gamma \in \mathbb{N}$.

If $W(B) \le \frac{1}{2}\delta(B)$, Definition 3 trivially holds. Otherwise,
the boxes $C \in \mathcal{C}$ must then satisfy

$$W(C) < \tfrac{1}{16}W(B).$$

Choose two such boxes $C_0, C_p$, to additionally satisfy

$$\inf_{x \in C_0} \mu(x) \le \inf_{x \in B} \mu(x), \qquad \sup_{x \in C_p} \mu(x) \ge \sup_{x \in B} \mu(x).$$

Then

$$\mu(C_p) - \mu(C_0) \ge \tfrac{7}{8}W(B) \ge W(B) - \tfrac{1}{2}\delta(B).$$

Further choose a sequence of boxes $C_0, \dots, C_p \in \mathcal{C}$, with the
property that each box $C_i$ agrees with $C_{i-1}$ except along axis $i$. We
must then have some $C_i, C_{i-1}$ with

$$\mu(C_i) - \mu(C_{i-1}) \ge \tfrac{1}{p}\bigl[W(B) - \tfrac{1}{2}\delta(B)\bigr].$$

As $d(C_i), d(C_{i-1}) \le d(B) + \Gamma$, these two boxes satisfy the
conditions of Definition 3, for some fixed $\gamma \in (0, 1)$.

We have thus established Definition 3 for all $B \in \mathcal{C}_m$, when $m$
is greater than some $M$. When $m \le M$, we note that by uniform continuity,
the above argument will still hold, potentially after increasing $\Gamma$. □



4.2 Proofs of lower bounds 

We now prove our lower bounds on the regret. To begin, we will need a lower 
bound on the regret in multi-armed bandits, as proved by Bubeck (2010). 

Lemma 14. Let ^ < q < p < |, and set A = p — q. Consider a multi-armed 
bandit problem, with arms $X = \{1, \dots, K\}$. Let the rewards be Bernoulli,
with some arm $k \in X$ having mean $p$, and all other arms mean $q$. Then for
any sequential choice of arms $x_t$,



K ^ Rt 1 3 /rA2 

Proof. The result follows similarly to Bubeck's Lemma 2.2, noting that the
Kullback-Leibler divergence of Bernoulli random variables, with means $p$
and $q$, is bounded by the $\chi^2$ divergence.
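The comparison of divergences invoked in this proof can be checked numerically. The following is a minimal sketch, assuming the intended bound is the standard inequality KL <= chi-squared for Bernoulli laws; the function names and the test grid are illustrative and not taken from the paper.

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence KL(Ber(p) || Ber(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def chi2_bernoulli(p, q):
    """Chi-squared divergence of Ber(p) from Ber(q): (p - q)^2 / (q (1 - q))."""
    return (p - q) ** 2 / (q * (1 - q))

# Scan a grid of means and record any pair where KL exceeds chi-squared
# (a small tolerance absorbs floating-point rounding).
grid = [i / 100 for i in range(1, 100)]
violations = [
    (p, q)
    for p in grid
    for q in grid
    if kl_bernoulli(p, q) > chi2_bernoulli(p, q) + 1e-12
]
print(len(violations))
```

On this grid no violations are found, which is consistent with the general inequality KL <= log(1 + chi-squared) <= chi-squared.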

We may now prove the tree-armed bandit lower bound. 
Proof of Theorem 7. We proceed by a reduction to a multi-armed bandit
problem. We will first assume that $\beta > 0$, $p = 1$, and every node in the
single tree $\mathcal{T}_1$ has at least three children; we return to the other cases later.

Set $U_{0,0} := X$, and for each $j = 0, 1, \dots$ and $k = 0, \dots, 2^j - 1$, pick two
children $U_{j+1,2k}$, $U_{j+1,2k+1}$ of $U_{j,k}$ in $\mathcal{T}_1$. Then define functions

//j-fc(x) := M 1 + (1-q)^q'1 [x G |J ^'-M ^ Uj,k) \ , 

\ 1=0 \ k=0 J J 

where the constant $\sigma := 2^{-1/\beta}$.

We first show that the functions $\mu_{j,k}$ satisfy Definition 2. Define $j_m \in \mathbb{N}_0$

by

and a constant $J \in \mathbb{N}_0$ by

$$\sigma^J \ge 1/4p > \sigma^{J+1}.$$

Let $\mathcal{B}_m$ be given by the collection of all nodes $U$ of depth $j_m + J$, and set

2^-1 1 



k=0 



We have that: 

(i) each Cm covers Xs^ , with 

and 

W{B) < io^^+'^+i < 6m/l2p 

for $B \in \mathcal{C}_m$;

(ii) the sets $\mathcal{B}_m$ partition $X$, and get finer as $m$ increases;

(iii) each $\mathcal{G}_m$ is a grid over $X$; and

(iv) if $B \in \mathcal{C}_m$, $d(B) = j_m + J \le \beta(m + 1 + \log_2(p))$.

The $\mu_{j,k}$ thus satisfy Definition 2, with zooming dimension $\beta$, and constants
K = \g-\ApY'-^^9, A = /3(2 + log2(p)). 

We next show that these functions satisfy Definition 3. If $B \in \mathcal{B}$ is not
equal to some $U_{l,k}$, then $W(B) = 0$, so Definition 3 is trivially satisfied. Now
suppose $B = U_{l,k}$, and let $L = \lceil 2\beta \rceil$.
We first assume that $l + L \le j$, so we have boxes $B_1, B_2 \subseteq B$, with
$d(B_k) \le l + L$, satisfying

$$\mu \ge \tfrac12 (2 - \sigma^{l+L}) \text{ on } B_1, \qquad \mu = \tfrac12 (2 - \sigma^{l}) \text{ on } B_2.$$

Since $W(B) \le \tfrac12 \sigma^l$, we conclude that

fi{Bi) - fi{B2) > (1 - a^)Ty(S) > lW{B) > W{B) - 

If $l + L > j$, then we likewise have boxes $B_1, B_2 \subseteq B$, with $d(B_k) \le l + L$,
satisfying

$$\mu(B_1) - \mu(B_2) = W(B).$$

In either case, since

$$\pi(B_k \mid B) \ge q^{-L},$$

we conclude that Definition 3 is satisfied, with quality 7 = g~^'^l^~^ . 

Now define a family of distributions $P_{j,k}$ on the arms, letting $P_{j,k}(x)$
be the Bernoulli distribution with mean $\mu_{j,k}(x)$. From the above, we know
that $P_{j,k} \in \mathcal{P}$. It thus suffices to prove that, for $j$ chosen in terms of $T$, any
sequential choice of arms $x_t$, and $\varepsilon > 0$ small enough,




-l/(/3+2)^ 



> e, 



where $\mathbb{P}_{j,k}$ denotes probability under the arm distributions $P_{j,k}$.

Letting $U_j = \bigcup_{k=0}^{2^j - 1} U_{j,k}$, if $x_t \notin U_j$, the reward $Y_t$ has the same
distribution under every $P_{j,k}$, and regret at least as large, stochastically, as for
any arm $x_t \in U_j$. We may thus assume that all chosen arms $x_t \in U_j$. This
problem is then a multi-armed bandit problem, with $2^j$ Bernoulli arms
corresponding to the $U_{j,k}$.

We may thus apply Lemma 14, with K = 2^ , A = ^2"^^^ , and j G N 
satisfying 

2i{i+2//3) > > 2(j'-i)(i+2//5). 

We then have 



so the lemma implies that 



3 /rA2 1 

2V K - 4' 



2J-1 Rt . 
sup Efc— > 

fc=o ^ 



As $R_T/T\Delta \in [0, 1]$, we obtain



2.-1 / Rt l\ 1 
The result then follows since

$$12\Delta \ge 2^{-1/\beta} (4T)^{-1/(\beta+2)}.$$



It remains to consider the other cases. If $\beta = 0$, we apply a similar
argument to the two functions $\mu_{1,0}$, $\mu_{1,1}$, letting $\sigma = 1/\sqrt{3T}$. In this case,
Definitions 2 and 3 are trivially satisfied, with



the result then follows as before. If $p > 1$, we can define the $\mu_{j,k}$ to depend
only on the first coordinate of $x \in X$, and then continue as above; the results
then follow similarly.

Finally, if $\mathcal{T}_1$ has nodes with only two children, we can replace it with
a new tree $\mathcal{T}_1'$, containing all nodes in $\mathcal{T}_1$ of even depth, and where $U$ is a
child of $V$ in $\mathcal{T}_1'$ if it is a grandchild of $V$ in $\mathcal{T}_1$. Then every node of $\mathcal{T}_1'$ has
at least four children, and we may argue as above. □
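The final step of the proof, which keeps only the even-depth nodes of a tree, can be sketched in code. This is a minimal illustration, assuming a tree is represented by a function returning the children of a node; the names `skip_odd_depths` and `binary_children` are hypothetical, and the infinite binary tree stands in for a tree whose nodes have only two children.

```python
from typing import Callable, List, Tuple

Node = Tuple[int, ...]
Children = Callable[[Node], List[Node]]

def skip_odd_depths(children: Children) -> Children:
    """Return the child map of the tree keeping only even-depth nodes:
    the children of a node in the new tree are its grandchildren in the old one."""
    def grandchildren(node: Node) -> List[Node]:
        return [g for c in children(node) for g in children(c)]
    return grandchildren

def binary_children(node: Node) -> List[Node]:
    """Infinite binary tree: a node is its path from the root, as a tuple of bits."""
    return [node + (0,), node + (1,)]

squashed = skip_odd_depths(binary_children)
root: Node = ()
print(squashed(root))
```

Each node of the new tree has as many children as it had grandchildren, so a tree with at least two children per node yields one with at least four, as used in the argument.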

4.3 Proofs of regret bounds 

We next focus on the regret of ATB. We begin by describing an event which 
occurs with high probability, on which we can control the behaviour of our 
algorithm. We may then proceed to argue deterministically, conditional on 
this event. 

Definition 15. We will say an execution of ATB is clean if, for any $B \in \mathcal{B}$,
and $t \in \mathbb{N}$, the following holds.



(ii) For each $B$ on some cover $\mathcal{C}_m$, $m \in \mathbb{N}$, fix two boxes $B_1, B_2 \subseteq B$,



satisfying the conditions of Definition 3. Then if rt{B) < -y/7/6 




Lemma 16. In the setting of Theorem 8, an execution of ATB is clean with
probability at least $1 - \varepsilon$.

Proof. We will consider separately the two conditions in Definition 15, and 
show that the probability of each one failing is at most = 2r^^. 

(i) For any $B \in \mathcal{B}$ and $i \in \mathbb{N}$, define the random variables



{U £Ti:U has depth 1}; 



(i) Ifnt{B) > 0, \^xt{B)-^l{B)\ < rt{B). 






nt{B)=i, Ht{B), 
nt{B) < i for all t, 



so the martingale $M_n(B) := \sum_{i=1}^n Z_i(B)$ satisfies

$$\hat\mu_t(B) - \mu(B) = M_{n_t(B)}(B) / n_t(B).$$
If Definition 15(i) does not hold, we must have

$$|M_n(B)| > 2\sqrt{n \log[\rho(B)(\tau + n)]}, \qquad (3)$$

for some $B \in \mathcal{B}$ and $n \in \mathbb{N}$. For a single $B$ and $n$, by Azuma's inequality,
the event (3) has probability at most $2[\rho(B)(\tau + n)]^{-2}$.

Now, the number of boxes B of depth d is bounded by 

yj=0 J 

so by a union bound, the probability that any event (3) occurs is at 
most 

oo oo oo oo 

n=l d=0 n=r+l d=l 

(ii) Given $B$, $B_1$, and $B_2$, we likewise define the random variables

$$Z_{i,k}(B) := \begin{cases} 1(H_t(B_k)) - \pi(B_k \mid B), & n_t(B) = i,\ H_t(B), \\ 0, & n_t(B) < i \text{ for all } t, \end{cases}$$

so the martingale $M_{n,k}(B) := \sum_{i=1}^n Z_{i,k}(B)$ satisfies

$$n_t(B_k) - n_t(B)\,\pi(B_k \mid B) = M_{n_t(B),k}(B).$$

If Definition 15(ii) does not hold, we must have

nt{Bk) < ^-fnt{B), 
and thus, by Definition 3(ii), 

Mn,kiB) < -\m:{Bk \ B), (4) 

for some B £ B, > 2Alog[p{B){T + n)], and A: = 1,2. For a single 
B, n and k, by Freedman's inequality, the event (4) has probability at 
most [p{B)(t + n)]~'^. By a similar argument, the probability that any 
event (4) occurs is thus at most 2t~^. □ 
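The way a threshold of the form $2\sqrt{n \log x}$ turns Azuma's inequality into a polynomial tail can be verified with a short computation. The sketch below assumes martingale increments bounded by 1 in absolute value (an assumption made here for illustration, matching rewards in [0, 1]); the helper names are hypothetical.

```python
import math

def azuma_tail(n, a, c=1.0):
    """Two-sided Azuma-Hoeffding bound P(|M_n| > a) <= 2 exp(-a^2 / (2 n c^2)),
    for a martingale with increments bounded by c in absolute value."""
    return 2.0 * math.exp(-a * a / (2.0 * n * c * c))

def threshold(n, x):
    """Deviation threshold 2 sqrt(n log x), for x > 1."""
    return 2.0 * math.sqrt(n * math.log(x))

# With this threshold the bound collapses to 2 x^{-2}, independently of n.
for n, x in [(1, 2.0), (10, 5.0), (1000, 50.0)]:
    bound = azuma_tail(n, threshold(n, x))
    print(n, x, bound, 2.0 * x ** -2)
```

Taking $x = \rho(B)(\tau + n)$ then gives a bound that is summable over boxes and sample counts, which is what the union bound requires.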

To prove our regret bound, we will begin by showing that ATB only
activates boxes on the covers $\mathcal{C}_m$. Once this result is proved, we will be able
to very precisely control the behaviour of our algorithm.

We will say a box $B \in \mathcal{B}$ was active at time $t \in \mathbb{N}$, if it was active at
that time after ensuring Invariant 6. Likewise, we will say $B$ was selected
at time $t$, if $x_t$ was drawn from $\pi \mid B$.

We will also say $B$ was activated at time $t$, if it was made active at that
time, and let $\bar B$ be the box, active at time $t - 1$, whose deactivation led to $B$
being activated. Note we will consider the box $X$ to be active at the start
of the algorithm, and thus not activated at any time $t \in \mathbb{N}$.

Lemma 17. Let $P(t)$ be the statement that, if $B$ was activated at time $s \le t$,
and $2^{-m} < \delta(\bar B) \le 2^{1-m}$, then $B$ is on the cover $\mathcal{C}_m$. Then in the setting of
Theorem 8, on a clean execution, for any $B \in \mathcal{B}$ and $t \in \mathbb{N}$:

(i) if B was active at timet, and P{t) holds, ^W{B) — \^5{B) < 2pKrt{B); 

(ii) if B was selected at time t, and P{t) holds, A{B) < 2pKrt{B); 

(Hi) if B was activated at time t, and P{t — 1) holds, 5{B) < 5pKrt{B); 
and 

(iv) $P(t)$ holds for all $t \in \mathbb{N}$.

Proof. We will first establish (i)-(iii), and then use these results to prove 
(iv) inductively. 

(i) If $B \ne X$, then $B$ must have been activated at some time $s \le t$. If
$\delta(\bar B) = 0$, then as

$$W(B) \le \delta(B) \le \delta(\bar B),$$

the result is trivial; otherwise, by $P(t)$, $B$ must be on some cover $\mathcal{C}_m$.
If instead $B = X$, then $B$ is on the cover $\mathcal{C}_1$. In either case, we may
therefore let $B_1, B_2 \subseteq B$ be the boxes fixed in Definition 15(ii).

Suppose 

IW{B) - 16{B) > 2pKrt{B). (5) 
Since W{B) < 1, we must have rt{B) < a/t/^, so 

Lt(Bi) - UtiB2) > KBi) - KB2) - 2rt{Bi) - 2rt{B2) 
>l[^{B)-\5m-\Kr,{B) 

> Krt{B), 

where the first inequality follows from Definition 15 (i), the second from 
Definitions 3(iii) and 15(ii), and the third from (5). As this contradicts 
Invariant 6, we conclude 

^W{B) - 16{B) < 2pKrt{B). 

(ii) Pick $x^* \in X$ with

$$\mu(x^*) = \mu^*, \qquad (6)$$

and let $B^* \in \mathcal{B}$ be an active box containing $x^*$ at time $t$. By (i), we
have

W{B*) = ^W{B*) - \5{B*) < 2pKrt{B*), (7) 

so 

ItiB) > It{B*) > ii{B*) + WiB*) > fi*, 
where the first inequality follows since $B$ was selected, the second
from (7) and Definition 15(i), and the third from (6). Again using
Definition 15(i), we conclude

A{B) < 2pKrt{B). 

(iii) Since $\bar B$ was active at time $t - 1$, but not at time $t$, it must have been
selected at time $t - 1$, so

$$n_t(\bar B) = n_{t-1}(\bar B) + 1. \qquad (8)$$

If $n_t(\bar B) < 8$, the result follows from $\delta(\bar B) \le 1$. Otherwise, we have

6{B) < A{B) + W{B) < lpKrt-i{B) + i<5(5), (9) 

where the first inequality follows from (1), and the second from (i) and 
(ii). We thus obtain 

6{B) < ^pKrt-i{B) < 5pKrt{B), 

where the first inequality follows from (9), and the second from (8). 

(iv) We argue by induction on $t$. As no box can be activated at time $t = 1$,
$P(1)$ trivially holds. Suppose $P(t - 1)$ holds, but $P(t)$ does not. Then
we have a box $B$, activated at time $t$, which is the first-activated box
to not satisfy the condition $P(t)$.

Let $C$ be the box deactivated when $B$ was activated, with $C = \prod_{j=1}^p V_j$,
$V_j \in \mathcal{T}_j$, and let $i$ be the axis along which $C$ was split. As $C$ was
deactivated at time $t$, we have times $s_1, s_2 \le t$, and boxes $B_1, B_2 \subseteq C$,
differing only in axis $i$, for which

Ls,{Bi)-Us,iB2)>KrtiC). (10) 

We will show C is on the partition Bm, where m G N satisfies 2"™" < 
S{B) < 2^~™. If C = X, then C is trivially on Bm- Otherwise, since 
C^B, 

6{C) > 6{B) > 2-"^. 

Thus as P{t) holds for C, C lies on some partition Bi, I < m. By 
Definition 2(ii), C therefore lies on the partition Bm- 

In either case, since C <Z B, 6{C) < 5{B) < 21"'". Then as Bm is a 
partition, C must further lie on the cover Cm] by Definition 2(iii), C 
must in fact lie on one grid Gi^m- Let this grid be generated by subsets 
Sj of each tree Tj - 
We will suppose that $V_i \in S_i$. Then, as the members of $S_i$ are disjoint,
any two points $x_1, x_2 \in C$, agreeing except in the $i$-th coordinate, must
lie within a single member of the cover $\mathcal{C}_m$. We therefore have



IM^i)-M^2)|<42— (11) 

Let $U_{i,k}$, $k = 1, 2$, and $U_j$, $j \ne i$, be defined in terms of $B_1, B_2$, as
in Definition 3(i). Further let $\pi_0$ denote the product distribution of
$\pi_j \mid U_j$, $j \ne i$, let $\pi_k$ denote the distribution $\pi_i \mid U_{i,k}$, and for $x \in X$,
let $x_{-i}$ denote the vector $(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_p)$. We then obtain

Ls,{Bi) - Us,{B2) < fi{Bi) - fi{B2) 




IJl{x) d{'Ki - TT2){Xi) d'Ko{X-i) 



= 1^2-- < ^<5(5) < iKniB) < iKniC), 

where the first inequality follows from Definition 15(i), the second from 
(11), the third since S{B) > 2~"^, the fourth from (iii), and the fifth 
since nt{C) < nt{B). 

As this contradicts (10), we conclude that $V_i \notin S_i$. Thus, as $C$ is on
the grid $\mathcal{G}_{l,m}$, $B$ is formed from $C$ by splitting $V_i$, and the children of
$V_i$ are disjoint, $B$ must also be on the grid $\mathcal{G}_{l,m}$. As $B \subseteq C$, it must
further be on the cover $\mathcal{C}_m$, contradicting our choice of $B$. □

To prove our regret bound, we must now show ATB selects arms $x_t$ from
good boxes $B$. On a clean execution, the following lemma bounds both the
number of boxes we select, and the number of times each bad box is selected.
Define $N_t(B)$ to be the number of times $B$ has been selected before time $t$,
and define also the quantities

$$c_m := 1 + \sum_{l=0}^{m-1} \kappa 2^{l\beta+1}. \qquad (12)$$

Lemma 18. In the setting of Theorem 8, on a clean execution, for any
$B \in \mathcal{B}$:

(i) at most $\kappa 2^{(m-1)\beta+1}$ activated boxes $B$ satisfy $2^{-m} < \delta(\bar B) \le 2^{1-m}$;

(ii) at most $c_m$ selected boxes $B$ satisfy $\Delta(B) > 2^{-m}$;

(iii) if $B$ was selected, and $\delta(B) > 2^{-m}$, then $d(B) \le \lambda m$; and

(iv) if $\Delta(B) > 2^{-m}$, then for any $t \in \mathbb{N}$,

$$N_t(B)\Delta(B) \le O(\gamma^{-1} \lambda p \log(q))\, 2^m (m + \log(\tau + t)).$$
Proof.

(i) By Lemma 17(iv), if an activated box $B$ satisfies $2^{-m} < \delta(\bar B) \le 2^{1-m}$,
then $B$ is on the cover $\mathcal{C}_m$. We proceed by counting the activated boxes
$B$ on $\mathcal{C}_m$.

The set of all activated boxes forms a tree, with root $X$, and $B$ a child
of $C$ if $B$ was activated when $C$ was deactivated. The set of activated
boxes on the cover $\mathcal{C}_m$ thus forms a forest, where each internal node has
at least two children, and by Definition 2(i), there are at most $\kappa 2^{(m-1)\beta}$
leaves. As such a forest has at most twice as many nodes as leaves, we
conclude there are at most $\kappa 2^{(m-1)\beta+1}$ such boxes.

(ii) If $B \ne X$ was selected, it must have been activated, so we have a box
$\bar B$. Then as

$$\delta(\bar B) \ge \delta(B) \ge \Delta(B) > 2^{-m},$$

we conclude $2^{-l} < \delta(\bar B) \le 2^{1-l}$ for some $l \in \mathbb{N}$, $1 \le l \le m$. By (i), there
are at most $\sum_{l=1}^m \kappa 2^{(l-1)\beta+1}$ such $B$; the result follows after including
$B = X$.

(iii) As in (i) and (ii), $B$ must be on a cover $\mathcal{C}_l$, for some $l \le m$; the result
follows by Definition 2(iv).

(iv) If Nt{B) > 0, suppose B was last selected at time s < t. Then 

Nt{B) = Ns{B) + 1 
<ns{B) + l 

< IQK^A{B)-^ log[p{B){T + n,iB))] + 1, 
<Oij-')AiBr^\og[p{B){T + t)], 

where the first inequality follows from the definitions, the second from
Lemmas 17(ii) and 17(iv), and the third since $n_s(B) \le t$. Now, since

$$\delta(B) \ge \Delta(B) > 2^{-m},$$

by (iii) we also have $d(B) \le \lambda m$, so

$$\log[\rho(B)] \le p(\lambda m + 1) \log(q).$$

The result follows. □

With this lemma, we may now prove our bound on the regret of ATB. 

Proof of Theorem 8. Let the arm distributions $P(x)$ be given by any $P \in \mathcal{P}$,
and let $m_T \in \mathbb{Z}$ satisfy

$$2^{m_T - 2} < (T / C\eta_T)^{1/(\beta+2)} \le 2^{m_T - 1}, \qquad (13)$$

for logarithmic term $\eta_T$, and large enough constant $C$ as in the statement
of the theorem. Then

$$m_T \le O(\log(T)), \qquad (14)$$

and by Lemma 16 we may assume the execution is clean, so



Rt= Nt{B)A{B) 



B selected 
rriT 

Nt{b)a{b) + r2~™^ 

m=l B selected 

2-'"<A{B)<2i-'" 

rriT 



< 0{j-^Xplog{g)) Y Cm2"'(.m + log(T + T)) + T2-^^ 

m=l 

<T{T/Cr,T)-^/(^+^\ 

where the first inequality follows since we can select boxes with $\Delta(B) \le 2^{-m_T}$
at most $T$ times, the second from Lemmas 18(ii) and 18(iv), the third
from (12) and (14), and the fourth from (13). Note the higher log power
when $\beta = 0$ is necessary to control the size of the $c_m$ terms. □

Next, we establish the performance of ATB-D. 

Proof of Theorem 9. Since $\gamma \to 0$ as $m \to \infty$, we have some stage $M$ beyond
which $\mu$ satisfies Definition 3 with quality $\gamma$, and we can apply Theorem 8.
The probability that our regret bound (2) does not hold, for some stage
$m \ge M$, is thus at most

$$6\pi^{-2}\varepsilon \sum_{m=M}^{\infty} m^{-2} \le \varepsilon.$$

Suppose (2) holds for all stages $m \ge M$. If $T$ satisfies

$$2^m \le T + 1 < 2^{m+1} \qquad (15)$$

for some $m$, the $T$-th observation will be taken in stage $m$. For $T$ large
enough that $m \ge M$, we thus have

$$R_T \le \sum_{l=1}^{M-1} 2^l + \sum_{l=M}^{m} O^*\bigl(2^{l(1 - 1/(\beta+2))}\bigr)
\le 2^M + O^*\bigl(2^{m(1 - 1/(\beta+2))}\bigr)
\le O^*\bigl(T^{1 - 1/(\beta+2)}\bigr),$$

where the first inequality follows from (2), the second directly, and the third
from (15). The result in the statement of the theorem can then be seen to
hold by considering the terms $C$ and $\eta_T$ in Theorem 8. □
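The second inequality in the display above rests on a geometric sum: the per-stage regret bounds grow geometrically in the stage index, so their total is dominated by the last stage. The sketch below checks this numerically, ignoring the logarithmic factors hidden in $O^*$; the function names and the tested values of $\beta$ are illustrative assumptions.

```python
def stage_sum(M, m, a):
    """Sum of 2^(l * a) over stages l = M, ..., m."""
    return sum(2.0 ** (l * a) for l in range(M, m + 1))

def geometric_bound(m, a):
    """Closed-form bound 2^(m * a) / (1 - 2^(-a)) on the sum, valid for a > 0."""
    return 2.0 ** (m * a) / (1.0 - 2.0 ** (-a))

for beta in [0.0, 1.0, 2.0, 5.0]:
    a = 1.0 - 1.0 / (beta + 2.0)   # regret exponent 1 - 1/(beta + 2)
    for m in [5, 10, 20]:
        total = stage_sum(1, m, a)
        print(beta, m, total <= geometric_bound(m, a))
```

Since the exponent lies in [1/2, 1), the constant $1/(1 - 2^{-a})$ is at most about 3.5, so the sum over stages costs only a constant factor over the final stage.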
4.4 Proofs of computational cost

Finally, we prove bounds on the complexity of our algorithms. We begin 
with a lemma bounding the depth of active boxes. 

Lemma 19. In the setting of Theorem 8, on a clean execution, if a box
$B \in \mathcal{B}$ is active at time $t \in \mathbb{N}$, then $d(B) = O(\log(t))$.

Proof. If $B = X$, the result is trivial. Otherwise, let $C$ be the box
deactivated when $B$ was activated, and $s \le t$ the time when $C$ was deactivated.
Then

S{C) > W{C) > Ws{C) > Krs{C) > t'^l^, 

where the first inequality follows from (1), the second from Definition 15(i),
the third since $C$ was deactivated at time $s$, and the fourth since $n_s(C) \le t$.
Thus by Lemma 18(iii),

$$d(C) = O(\log(t)).$$

The result follows since $d(B) \le d(C) + 1$. □

Proof of Theorem 11. We will begin with ATB, and divide the computa- 
tional cost into four parts: 

(i) the cost of updating priorities in the priority queue; 

(ii) the cost of insertions to and deletions from the priority queue; 

(iii) the costs otherwise associated with activating new boxes $B$; and

(iv) all other costs. 

For part (i), we note that all active boxes $B$ must contain at least one
design point $x_s$, so at time $t$, there can be at most $t$ active boxes. Each
turn, we update the priority of one active box with cost $O(\log(T))$, so
the total cost of updates is

$$O(T \log(T)).$$

For part (ii), we note that the boxes activated by time $T$ form a tree, as
in the proof of Lemma 18(i). As the leaves of the tree are the boxes active
at time $T$, there are at most $T$ leaves. Since all internal nodes have at least
two children, there are thus $O(T)$ boxes in total. Then as each box can have
been inserted to and deleted from the priority queue at most once, with cost
$O(\log(T))$, the total cost of insertions and deletions is

$$O(T \log(T)).$$
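The counting behind parts (i) and (ii) can be illustrated with a toy simulation. The sketch below is not ATB itself: it assumes a simplified model in which each turn pops the highest-priority box from a binary heap and either reinserts it with a new priority or replaces it by two children. The function name, the random priorities, and the split probability are all hypothetical.

```python
import heapq
import random

def simulate(T, split_prob=0.3, seed=0):
    """Toy model of a priority queue of active boxes over T turns."""
    rng = random.Random(seed)
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-rng.random(), 0)]
    next_id = 1          # total number of boxes created so far
    pushes, pops = 1, 0  # heap operations, each costing O(log T)
    for _ in range(T):
        _, box = heapq.heappop(heap)
        pops += 1
        if rng.random() < split_prob:
            # Deactivate `box` and activate two children in its place.
            for _ in range(2):
                heapq.heappush(heap, (-rng.random(), next_id))
                next_id += 1
                pushes += 1
        else:
            # Priority update, implemented as pop followed by push.
            heapq.heappush(heap, (-rng.random(), box))
            pushes += 1
    return {"pushes": pushes, "pops": pops, "active": len(heap), "boxes": next_id}

stats = simulate(1000)
print(stats)
```

In this model the number of active boxes grows by at most one per turn and the total number of boxes is at most $2T + 1$, so the heap performs $O(T)$ operations of cost $O(\log(T))$ each, matching the $O(T \log(T))$ totals above.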
For part (iii), we note that activating a box $B$ at time $t$ requires the
computation of a constant number of stored quantities, each of which takes
time linear in $n_t(B)$. The total cost of activations is thus

$$\sum_{B \text{ activated by time } T} O(n_T(B)) = \sum_{t=1}^{T} \sum_{\substack{B \text{ hit by } x_t, \text{ and} \\ \text{activated by time } T}} O(1).$$

Now, by Lemma 16 we may assume the execution is clean, and so apply
Lemma 19. Hence each design point $x_t$ can have hit at most $O(\log(T))$
boxes active by time $T$. The total cost of activations is thus

$$O(T \log(T)).$$

For part (iv), we note that at time $t$, the remaining costs are those of
updating stored quantities related to the selected box $B$. As the number of
such quantities is bounded, and each update can be performed in constant
time, the total remaining cost is

$$O(T).$$

For ATB, in the setting of Theorem 8, with probability at least $1 - \varepsilon$,
the computational cost is thus $O(T \log(T))$. For ATB-D, as in the proof
of Theorem 9, we may assume that this bound holds for all stages $m \ge M$,
where $M \in \mathbb{N}$ is fixed.

For stages $m < M$, we must provide an alternate bound in part (iii).
Since we showed in part (ii) that there are $O(T)$ boxes activated by time $T$,
the computational cost of part (iii), and thus ATB as a whole, is at most

$$O(T^2).$$

In the setting of Theorem 9, if $T$ satisfies

$$2^m \le T + 1 < 2^{m+1}$$

for some $m$, the $T$-th observation will be taken in stage $m$. For $T$ large
enough that $m \ge M$, with probability at least $1 - \varepsilon$, the computational cost
is then

$$\sum_{l=1}^{M-1} O(2^{2l}) + \sum_{l=M}^{m} O(l 2^l) = O(m 2^m) = O(T \log(T)). \qquad \square$$